Search for: All records
Total Resources: 3
Filter by Author / Creator:
- Xu, Yi (3)
- Zeng, Belinda (3)
- Cheng, Guang (2)
- Lin, Xiaofeng (2)
- Song, Qifan (2)
- Xing, Yue (2)
- Anantharaman, Aditya (1)
- Chen, Changyou (1)
- Chen, Yiran (1)
- Chilimbi, Trishul (1)
- Cui, Qingjun (1)
- Muhamed, Aashiq (1)
- Wang, Guoyin (1)
- Zhang, Jianyi (1)
- Zhong, Kai (1)
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- Xing, Yue; Lin, Xiaofeng; Song, Qifan; Xu, Yi; Zeng, Belinda; Cheng, Guang (The 27th International Conference on Artificial Intelligence and Statistics)
- Zhang, Jianyi; Muhamed, Aashiq; Anantharaman, Aditya; Wang, Guoyin; Chen, Changyou; Zhong, Kai; Cui, Qingjun; Xu, Yi; Zeng, Belinda; Chilimbi, Trishul; et al. (The 61st Annual Meeting of the Association for Computational Linguistics)
Knowledge Distillation (KD) (Hinton et al., 2015) is one of the most effective approaches for deploying large-scale pre-trained language models in low-latency environments by transferring the knowledge contained in the large-scale models to smaller student models. Previous KD approaches use the soft labels and intermediate activations generated by the teacher to transfer knowledge to the student model parameters alone. In this paper, we show that having access to non-parametric memory in the form of a knowledge base with the teacher's soft labels and predictions can further enhance student capacity and improve generalization. To enable the student to retrieve from the knowledge base effectively, we propose a new Retrieval-augmented KD framework with a loss function that aligns the relational knowledge in teacher and student embedding spaces. We show through extensive experiments that our retrieval mechanism can achieve state-of-the-art performance for task-specific knowledge distillation on the GLUE benchmark (Wang et al., 2018a).
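For context on the baseline this abstract extends, below is a minimal sketch of standard soft-label knowledge distillation (Hinton et al., 2015) in PyTorch. The function name, temperature, and loss weighting are illustrative assumptions; the retrieval-augmented components and relational alignment loss described in the paper are not reproduced here.

```python
# Minimal sketch of soft-label knowledge distillation (Hinton et al., 2015),
# the baseline the abstract builds on. The retrieval-augmented framework and
# relational alignment loss from the paper are NOT shown; temperature and
# alpha values are illustrative assumptions.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a temperature-softened KL term."""
    # Teacher's soft labels, softened by the temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between student and teacher distributions, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    kd_term = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term


if __name__ == "__main__":
    # Toy example: batch of 8 examples, 4-class task, random logits.
    student_logits = torch.randn(8, 4, requires_grad=True)
    teacher_logits = torch.randn(8, 4)
    labels = torch.randint(0, 4, (8,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```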